Development of hypertension models for lung cancer screening cohorts using clinical and thoracic aorta imaging factors

This study aims to develop and validate nomogram models utilizing clinical and thoracic aorta imaging factors to assess the risk of hypertension for lung cancer screening cohorts. We included 804 patients and collected baseline clinical data, biochemical indicators, coexisting conditions, and thoracic aorta factors. Patients were randomly divided into a training set (70%) and a validation set (30%). In the training set, variance, t-test/Mann–Whitney U-test and standard least absolute shrinkage and selection operator were used to select thoracic aorta imaging features for constructing the AIScore. Multivariate logistic backward stepwise regression was utilized to analyze the influencing factors of hypertension. Five prediction models (named AIMeasure model, BasicClinical model, TotalClinical model, AIBasicClinical model, AITotalClinical model) were constructed for practical clinical use, tailored to different data scenarios. Additionally, the performance of the models was evaluated using receiver operating characteristic (ROC) curves, calibration curves and decision curve analyses (DCA). The areas under the ROC curve for the five models were 0.73, 0.77, 0.83, 0.78, 0.84 in the training set, and 0.77, 0.78, 0.81, 0.78, 0.82 in the validation set, respectively. Furthermore, the calibration curves and DCAs of both sets performed well on accuracy and clinical practicality. The nomogram models for hypertension risk prediction demonstrate good predictive capability and clinical utility. These models can serve as effective tools for assessing hypertension risk, enabling timely non-pharmacological interventions to preempt or delay the future onset of hypertension.


Patients
This study was approved by the Ethics Committee of Wuhan Union Hospital ([2021]0853) in accordance with the Declaration of Helsinki. Because the research is retrospective and the data are anonymous, the requirement for informed consent was waived by the Wuhan Union Hospital Ethics Committee.
From November 2018 to November 2019, patients who were hospitalized in the General Medical and Geriatrics Department were included. After excluding 348 patients who did not meet the criteria (see Fig. 1 for details), 804 patients were eventually included in the study, of whom 439 had hypertension and 365 did not. Blood pressure measurement and diagnosis of hypertension were carried out strictly by physicians according to the 2018 ESC/ESH Guidelines for the management of arterial hypertension 32. All patients were then randomly divided into a training cohort and an internal validation cohort at a ratio of 7:3. The flowchart presents the detailed inclusion and exclusion criteria, blood pressure monitoring and diagnosis, and grouping.
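As an illustration of the 7:3 random division, the following minimal Python sketch shuffles patient identifiers and splits them. The function name, the seed and the rounding rule are assumptions made for illustration; the study performed its analysis in R and does not describe these details here.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs and split them into training and validation lists.

    The seed and the rounding of the training-set size are illustrative
    assumptions, not details reported by the study.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 804 patients, identified here by hypothetical integer IDs 0..803
train_ids, valid_ids = split_cohort(range(804))
print(len(train_ids), len(valid_ids))
```

A simple shuffle does not guarantee the same hypertension prevalence in both cohorts, so the balance of the two sets still has to be checked afterwards.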
Baseline clinical data, including age, sex, height, weight, personal history, serum biochemical markers and other concomitant diseases were acquired from the electronic medical records system.

Thoracic aorta segmentation and features extraction
In this study, a multi-task learning framework (provided by Shanghai United Imaging Intelligence Co. Ltd., Shanghai, China) was used to automatically measure nine key positions of the thoracic aorta recommended by AHA guidelines 33 on chest CT images (Fig. 2, Supplementary Fig. S1). The framework is an automatic postprocessing tool for the thoracic aorta that uses deep learning methods and has been applied and validated on challenging non-contrast CT (NCCT) images.
The automatic quantification of the thoracic aorta consists of three tasks: aortic segmentation, aortic anatomical landmark detection and aortic measurement. Using two parallel subnetworks, the framework performs aortic segmentation and thoracic aortic anatomical landmark localization simultaneously. Specifically, the segmentation subnetwork delineates the thoracic aortic boundary, and the detection subnetwork detects five key anatomical landmarks: the aortic sinus, brachiocephalic artery, left common carotid artery, left subclavian artery, and celiac trunk. Based on the segmented aortic mask and the five detected anatomical landmarks, the nine key landmarks recommended by the AHA guidelines can be inferred. The diameters and cross-sectional areas at the nine landmarks, as well as the total and segmental volumes and lengths, can then be calculated. More details and performance tests for each section can be found in reference 34.

Feature selection and development, validation of signature
A three-step procedure was performed for dimensionality reduction of the thoracic aorta imaging factors. First, thoracic aorta imaging factors with variance > 1.0 were selected. Second, analysis of variance (ANOVA) was applied to identify the features that differed significantly between the non-HTN and HTN groups. Finally, the thoracic aorta imaging factors that met both criteria (variance > 1.0 and a significant difference between the non-HTN group and HTN group) were entered into least absolute shrinkage and selection operator (LASSO) regression to select the most relevant features with non-zero coefficients in the training cohort. After feature selection, the AIScore was computed for each patient as a linear combination of the selected features weighted by their respective LASSO coefficients. Both feature selection and AIScore development were performed in the training cohort, and the AIScore was then evaluated in the internal validation set.
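The first and last steps of this procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the feature values, the name "flat_feature", the coefficients and the intercept are all hypothetical, and the ANOVA and LASSO fitting steps (done with R packages in the study) are not reproduced.

```python
from statistics import variance

def variance_filter(features, threshold=1.0):
    """Return the names of features whose sample variance exceeds the threshold."""
    return [name for name, values in features.items()
            if variance(values) > threshold]

def ai_score(patient, coefficients, intercept=0.0):
    """Linear combination of selected features weighted by their coefficients."""
    return intercept + sum(coefficients[name] * patient[name]
                           for name in coefficients)

# Hypothetical measurements for four patients (not study data)
features = {
    "D3": [30.1, 33.4, 28.7, 35.2],            # varies between patients, kept
    "flat_feature": [10.0, 10.2, 9.9, 10.1],   # nearly constant, dropped
}
kept = variance_filter(features)
print(kept)

# Hypothetical coefficients; the study's fitted values are shown in its Fig. 3
coefficients = {"D3": 0.05, "A1": -0.002}
score = ai_score({"D3": 30.0, "A1": 500.0}, coefficients, intercept=-1.0)
print(score)
```

Because the score is a plain weighted sum, it can be recomputed for any new patient from the selected measurements alone, without refitting the model.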

Construction of different models
In order to meet the needs of different clinical application scenarios, we established five models with different features: AIMeasure model, BasicClinical model, TotalClinical model, AIBasicClinical model and AITotalClinical model.
The AIMeasure model was constructed as the weighted sum of the selected thoracic aorta imaging factors using logistic regression. The clinical factor models comprised the BasicClinical model and the TotalClinical model, which were built using the directly available basic clinical data (including sex, age, height, weight, BMI, smoking history, drinking history, etc.) and the total clinical data (including the above basic clinical data, serum biomarkers and concomitant diseases, etc.), respectively. Concretely, we first applied univariate analysis to compare the clinical factors between the two groups. The variables with statistically significant differences were then entered into multivariate logistic backward stepwise regression to build the clinical factor models. Further, we established the AIBasicClinical model and the AITotalClinical model by combining the valuable clinical factors with the AIScore using the same multivariate logistic backward stepwise regression method.
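Each of these models is a logistic regression, so a patient's predicted risk follows from the linear predictor through the logistic function. The sketch below shows that conversion only; the intercept and coefficients are hypothetical placeholders, not the fitted values reported in the study.

```python
import math

def predicted_risk(intercept, coefficients, patient):
    """Convert a logistic-regression linear predictor into a probability."""
    linear_predictor = intercept + sum(coefficients[name] * patient[name]
                                       for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients for an AIBasicClinical-style model
coefficients = {"age": 0.04, "AIScore": 1.0}
risk = predicted_risk(-3.0, coefficients, {"age": 60, "AIScore": 0.6})
print(round(risk, 3))
```

A nomogram presents the same calculation graphically: each predictor contributes points proportional to its coefficient times its value, and the total maps to a probability.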

Assessment of the performance of different models
The predictive performance of the five models for identifying hypertension was evaluated using receiver operating characteristic (ROC) curves in both the training set and validation set. The DeLong test was used to compare the AUC values of the different models. The agreement between the predicted and actual probabilities of the models was assessed with calibration curves. To assess the clinical applicability of the five models, decision curve analysis (DCA) was carried out by calculating the net benefits.
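Two of these quantities have compact definitions that a short Python sketch can make concrete. The AUC equals the probability that a randomly chosen hypertensive patient receives a higher predicted risk than a randomly chosen non-hypertensive patient, and the net benefit at threshold probability pt is TP/N - (FP/N) × pt/(1 - pt). The data below are hypothetical, and the study itself used the "pROC" and "rmda" R packages rather than this code.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def net_benefit(probs, labels, threshold):
    """Net benefit of treating every patient whose predicted risk is >= threshold."""
    n = len(labels)
    true_pos = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)

# Hypothetical predicted risks and outcomes (1 = hypertension)
probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc_value = auc(probs, labels)
nb = net_benefit(probs, labels, threshold=0.5)
print(round(auc_value, 3), round(nb, 3))
```

A decision curve is obtained by evaluating the net benefit over a range of thresholds and comparing it with the "treat all" and "treat none" strategies.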

Statistical analysis
The statistical analysis was performed using R software (version 4.0.3, https://www.r-project.org). A p value < 0.05 was considered statistically significant.
The Shapiro-Wilk test was used to test for normal distribution, and the Levene test was used to test for homogeneity of variance. The independent-samples t-test (for normally distributed continuous variables), Mann-Whitney U-test (for non-normally distributed continuous variables) and chi-square test (for categorical variables) were used to compare clinical factors between the two groups. Normally distributed variables were presented as mean ± standard deviation, non-normally distributed variables were expressed as median (25th and 75th percentile), and categorical variables were expressed as count (percentage). Differences in thoracic aorta imaging factors were compared using ANOVA.
The "glmnet" package was used for standard LASSO regression. The "rms" package was used to perform the multivariate binary logistic regression and develop the nomograms. The "pROC" package was used to plot ROC curves, and the DeLong test was used to compare the AUC values of the different models. Calibration curves were plotted using the "rms" package, and the DCA was carried out using the "rmda" package.

Characteristics of patients
A total of 804 patients who met the inclusion criteria were enrolled in this study (median age 52 years; 30.6% male), including 439 patients with hypertension (54.6%) (Fig. 1, Table 1). Detailed clinical features of the patients are presented in Table 1. None of the clinical features differed significantly between the training set and validation set.
In terms of basic clinical features, all the features, including age, sex, height, weight, BMI, and history of smoking and drinking, were significantly different between the HTN and non-HTN groups. High blood pressure was associated with older age, greater BMI, and a history of smoking and drinking. In addition, high blood pressure appeared to be more common in women.
Concerning serum biochemical markers, HTN patients tended to have higher blood glucose and higher blood lipids. In addition, alanine aminotransferase (ALT), blood urea nitrogen (BUN), creatinine (CREA) and serum potassium (K) were also statistically different between the HTN and non-HTN groups.
Regarding comorbidities, the number of patients with comorbidities was significantly higher in the HTN group than in the non-HTN group. Hyperglycemia, hyperlipidemia, hyperuricemia, peripheral atherosclerosis, coronary atherosclerosis, cerebral atherosclerosis, lacunar cerebral infarction, fatty liver and other diseases tended to occur more often in hypertensive patients.

Features extraction, selection and establishment of signature
In total, 43 features, including the diameter and area at 9 levels of the thoracic aorta, the volume and length between two adjacent levels, the volume and length of the ascending aorta, aortic arch and descending aorta, and the total volume and length, were obtained by AI on non-contrast enhanced chest CT (see Supplementary Table S1 and Fig. S1 for details). After excluding features with variance ≤ 1, the remaining irrelevant and redundant features were further excluded by one-way ANOVA and LASSO regression. In the end, the six most relevant features were selected. The selected features and their corresponding coefficients are shown in Fig. 3. The AIScore was then established using the selected features and their coefficients. The AIScore showed a statistically significant difference between the HTN and non-HTN groups (Supplementary Fig. S2).

Development of nomogram
In order to adapt to different clinical application scenarios, we established five different models based on different clinical and imaging features.
In the clinical models, backward stepwise logistic regression showed that age, height, weight, serum biomarkers (including cholesterol (CHOL), high-density lipoprotein cholesterol (HDL.C), low-density lipoprotein cholesterol (LDL.C), creatinine (CREA) and serum potassium (K)) and accompanying diseases (including hyperlipidemia (hyperlip), peripheral artery atherosclerosis (peri_AS) and coronary atherosclerosis (con_AS)) were risk predictors of hypertension. Since age, height and weight can be obtained directly, we used these three features to build the BasicClinical model (Supplementary Fig. S3A), and combined them with the serum markers and concomitant diseases to establish the TotalClinical model (Supplementary Fig. S3B). Subsequently, we used multiple logistic regression to construct the AIMeasure model from the six thoracic aorta features screened previously. Finally, both clinical features and the AIScore were included in multiple stepwise backward logistic regression to construct two mixed models called the AIBasicClinical model (Supplementary Fig. S3C) and the AITotalClinical model (Supplementary Fig. S3D).
The nomograms of the five models are shown in Supplementary Fig. S3, and the selected valuable features and coefficients included in the clinical and mixed models are listed in Table 2.

Evaluation and comparison of performance of different models
The ROC curves of the five models in the training and validation sets are shown in Fig. 4, and the diagnostic performance is summarized in Table 3. All five models showed good diagnostic performance for hypertension in both the training set (AUC 0.735-0.836, sensitivity 65.5-73.3%, specificity 66.7-72.5%) and the validation set (AUC 0.767-0.818, sensitivity 63.6-68.2%, specificity 70.9-77.3%). Subsequently, we compared the AUC among the five models (Table 4). In the training set, the AUC values of the TotalClinical model and the AITotalClinical model were statistically different from those of the other models (P < 0.001), while in the validation set there was no significant difference in AUC among the five models (P > 0.05).
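Sensitivity and specificity depend on the probability cutoff used to label a patient as hypertensive. The following Python sketch shows how both are obtained at a given cutoff and how Youden's index (sensitivity + specificity - 1) summarizes them. The data and the cutoff of 0.5 are hypothetical, and whether the study chose its cutoffs by Youden's index is an assumption, not something stated in the text.

```python
def sensitivity_specificity(probs, labels, cutoff):
    """Return (sensitivity, specificity) when risk >= cutoff is called positive."""
    true_pos = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    false_neg = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    true_neg = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    false_pos = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)

# Hypothetical predicted risks and outcomes (1 = hypertension)
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(probs, labels, cutoff=0.5)
youden = sens + spec - 1.0
print(round(sens, 3), round(spec, 3), round(youden, 3))
```

Scanning the cutoff over all observed probabilities and keeping the one with the largest Youden index is one common way to report a single operating point from a ROC curve.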
Calibration curves and the Hosmer-Lemeshow test showed that the five models were well calibrated in both the training set (P = 0.079-0.570) and the validation set (P = 0.117-0.977) (Fig. 6). The decision curve analysis of the five models is shown in Fig. 5; the five models provided a net benefit to patients across most reasonable threshold probabilities.

Discussion
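This study develops five distinct hypertension risk prediction models customized for various clinical scenarios. The BasicClinical model incorporates readily available factors such as age, height, and weight, making it applicable to a broad audience. Expanding on the BasicClinical model, the TotalClinical model integrates additional parameters, including blood glucose, blood lipids, electrolyte levels, comorbidities, and other clinician-assessed factors typically gathered during physical examinations and prior medical visits. The AIMeasure model encompasses dimensions such as the diameter, cross-sectional area, volume and length of the thoracic aorta. These metrics can be efficiently measured by AI using standard non-enhanced chest CT scans. Consequently, hypertension risk can be estimated at the same time as lung cancer screening or pulmonary nodule follow-up, amplifying the value of chest CT assessments. The AIBasicClinical model and the AITotalClinical model include the AIScore and the aforementioned clinical risk factors. Our hypertension risk prediction models exhibit robust calibration and substantial clinical utility. Physicians can anticipate hypertension risk based on established factors, allowing for effective preventative measures or treatment strategies. Notably, in the initial stages of data processing, we also applied the k-fold method, generating thousands of models and evaluating each one. We observed a considerable degree of stability in these results, which prompted us to adopt the 7:3 random division method for the final analysis presented in the paper. The robustness across sampling outcomes is satisfactory, and we attribute this to our ample sample size, particularly within the specific target population of individuals undergoing lung cancer screening.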
Traditional risk factors for high blood pressure comprise age, BMI, smoking and others. Age serves as an independent predictor of hypertension due to diminished vascular elasticity, sluggish blood flow, and heightened blood viscosity 35. Studies have yielded inconclusive findings regarding the association between hypertension and gender 9,36. Within this study, women exhibited a slightly elevated hypertension incidence compared to men, potentially influenced by a greater female representation in the sample. A European study indicated a notably lower incidence of cardiovascular disease in women compared to men up to age 45, with no substantial variance in prevalence by age 60, potentially attributed to estrogen's protective impact on blood vessels 37. Studies have shown that obesity, especially abdominal obesity, independently heightens the risk of hypertension 9,38,39. As obesity rates surge, the incidence of high blood pressure escalates. Furthermore, smoking and alcohol consumption also contribute to an elevated hypertension risk 9,10,40,41.
Biochemical markers obtained from routine medical check-ups and visits for other conditions provide valuable insights. This study focused on parameters such as blood glucose, lipid profiles, electrolytes and comorbidities. After rigorous variable screening, the model integrated cholesterol, high-density lipoprotein, low-density lipoprotein, blood potassium, creatinine, hyperlipidemia, and arteriosclerosis. Anomalies in glucose metabolism heighten hypertension risk by damaging blood vessels 14. Concurrently, hyperlipidemia significantly amplifies hypertension risk, and the two often co-exist to accelerate arteriosclerosis 42. The mechanism may involve the rise and fluctuation of blood pressure increasing stress on the vascular wall, lipid deposition thickening the intima, and stimulation of the inflammatory response. This leads to injury to intimal endothelial cells, increased permeability, fibrosis of medial smooth muscle cells, increased arterial stiffness, and decreased elasticity. The effect of electrolytes on hypertension is twofold. Lowering sodium intake and increasing potassium intake are known to be beneficial in reducing hypertension 43,44, and adequate calcium intake is also advantageous for high blood pressure 45. Most studies have demonstrated a protective effect of magnesium against hypertension 46. The regulation of blood pressure by magnesium may include mechanisms such as acting as a calcium antagonist to regulate vascular tension and contraction, vascular endothelial function, aging and stiffness, vascular remodeling, oxidative stress, insulin resistance, inflammatory response, etc.
Hypertension is a prevalent clinical syndrome with multiple contributing factors. The occurrence of hypertension is attributed to various risk factors and the decompensation of blood pressure regulation mechanisms. At the same time, hypertension and its risk factors can influence one another and collectively contribute to the progression and aggravation of the disease. Hypertension frequently coexists with various conditions, including diabetes, hyperlipidemia, atherosclerosis, cardiovascular diseases, and cerebrovascular diseases. They may interact in a complex causal manner, exacerbating their respective pathological processes 47,48.
In a preliminary study 49, we demonstrated that the diameter of the thoracic aorta, particularly the middle descending aorta, significantly impacted masked hypertension and poorly controlled outcomes of hypertension. In this study, we measured the cross-sectional area, volume, and centerline length of the thoracic aorta in addition to the diameter at nine levels. Our study revealed significant differences in all diameters, areas, volumes, and lengths of the ascending and descending aorta between the non-hypertensive and hypertensive groups. Following dimensionality reduction and variable selection, the prediction model ultimately included D3, D4, D6, D7, A1 and V7_8. The AUC of this model was 0.735 in the training set and 0.767 in the validation set. In a previous study, the diameters and volumes of the ascending, arch, and descending segments of the thoracic aorta were larger in hypertensive patients than in subjects with normal blood pressure (P < 0.001), and the differences persisted after adjusting for age 50. According to Laplace's law, the size of the vascular lumen is inversely proportional to the thickness of the wall and directly proportional to the pressure 51. To maintain the stability of circumferential pressure under the condition of constant vascular thickness, an increase in blood pressure inevitably leads to an expansion of diameter. Vasodilation mediated by blood flow occurs when a sudden increase in blood flow in the lumen induces shear force on the vascular wall, resulting in damage to vascular endothelial cells and subsequent vasodilation 25,52.
We developed five distinct models based on conventional risk factors, biochemical markers, comorbidities, and thoracic aorta imaging factors measured on non-enhanced chest CT. All five models exhibited good diagnostic performance (AUC 0.735-0.836), along with robust calibration capabilities and significant clinical net benefits. The Framingham Heart Study developed a short-term hypertension prediction model considering age, gender, SBP, DBP, current smoking, parental hypertension, BMI, and the interaction between age and DBP 9, under the premise of Caucasian patients without diabetes. However, it has been verified that the model's extensibility is limited 10,11. In reality, many chronic diseases tend to co-occur, such as diabetes and hypertension. Therefore, not excluding individuals with diabetes from the study is more in line with the actual clinical scenario. Numerous hypertension prediction models have been developed in Asia, including China, but they often incorporate only partial risk factors and location-specific variables. As an example, a prediction model for Kazakh herdsmen in Xinjiang not only included age, body mass index, blood lipids, and other factors but also considered dietary factors (such as yak butter, often consumed in pastoral areas). The model achieved an AUC of 0.803 in the modeling set and 0.809 in the verification set 12. Leveraging genetic and environmental factors, Li et al. built prediction models for systolic and diastolic blood pressure, yielding AUC values of 0.673 and 0.817, respectively 17.
In summary, we developed and validated five hypertension prediction models using distinct predictors. Primary care workers can select from different prediction models, tailoring hypertension predictions for individual patients based on the available predictors. For example, when patients undergo chest CT for lung cancer screening, the AIMeasure model can predict the risk of hypertension; conversely, when patients present additional clinical data such as major biochemical laboratory examinations and past medical history, the AITotalClinical model emerges as a valuable tool for predicting hypertension risk. This will enhance strategies for preventing and treating hypertension, effectively reducing and delaying the onset of adverse events related to hypertension.
Nevertheless, certain limitations in this study warrant consideration. First, the study focused on patients undergoing routine health examinations in the general medical department of our hospital, with an age range spanning from 18 to 95 years. Consequently, the predictive capacity of the model in different ethnic groups or specific populations remains uncertain. Second, the study exclusively incorporated risk factors accessible through routine examinations, excluding other predictors such as economic status, educational level, psychosocial elements, and genetic markers. Lastly, as this study is a single-center cross-sectional study, the sample represents only a portion of the population in this region. Since hypertension risk factors can vary across different regions, the generalizability of this study is somewhat constrained, and the model's stability requires additional external validation.

Conclusion
In this study, five hypertension risk prediction models were established based on clinical risk factors for hypertension and thoracic aorta image features measured on non-enhanced chest CT. These include the BasicClinical model, which relies on easily obtainable basic information; the TotalClinical model, which incorporates comprehensive clinical data such as basic information, biochemical indicators, and comorbidities; the AIMeasure model, focusing on thoracic aorta image features; the AIBasicClinical model, combining basic information

Figure 1. Flowchart illustrating the inclusion criteria and grouping of patients.

Figure 3. The weights of selected thoracic aorta features measured by AI. The numerical value represents a specific level of the thoracic aorta; D, A, V, and L denote the diameter, area, volume, and length of the thoracic aorta at a specific level or two adjacent levels.

Figure 4. The ROC curves for the five models in the training set (A) and validation set (B).

Figure 5. Decision curve analysis (DCA) curves for the five models in the training set (A) and validation set (B).

Figure 6. Calibration curves for the five models in the training set (A-E) and validation set (F-J). From left to right, the panels depict the AIMeasure model, BasicClinical model, TotalClinical model, AIBasicClinical model and AITotalClinical model, arranged with the training set at the top and the validation set at the bottom. The dashed line represents the ideal prediction line. The red line illustrates the predictive efficacy of the nomogram in hypertension prediction. The green line indicates bias correction in the model.

Table 1. Comparison of clinical features between the HTN group and non-HTN group, as well as between the training set and validation set. Non-HTN non-Hypertension; HTN Hypertension; BMI body mass index; GLU fasting blood glucose; CHOL Cholesterol; TG Triglyceride; HDL.C high-density lipoprotein cholesterol; LDL.C low-density lipoprotein cholesterol; ALT alanine aminotransferase; AST aspartate aminotransferase; BUN blood urea nitrogen; CREA creatinine; Na serum sodium; K serum potassium; Ca serum calcium; Mg serum magnesium; CL serum chlorine; PHOS serum phosphorus; CO2 carbon dioxide; Hypergly Hyperglycemia; Hyperuri Hyperuricemia; Hyperlip Hyperlipidemia; Peri_AS peripheral arteriosclerosis; Con_AS coronary arteriosclerosis; CHD coronary heart disease; Cere_AS cerebral arteriosclerosis; OP osteoporosis; CKD chronic kidney disease; COPD chronic obstructive pulmonary disease.

Table 2. The coefficients of features included in the two clinical models and two mixed models. The abbreviations align with those in Table 1.
Scientific Reports | (2024) 14:6862 | https://doi.org/10.1038/s41598-024-57396-1

Table 3. Diagnostic performance of the five models in the training set and validation set. AUC area under the ROC curve.

Table 4. Comparison matrix of AUC for the five models in the training set and validation set. The lower-left triangle shows the differences in AUC among the five models in the training set, while the upper-right triangle (italics) shows the differences in AUC among the five models in the validation set. The diagonal (bold) represents self-comparisons of the five models. The DeLong test was applied for all comparisons.